05_1_A_Benchmark_Methods_FittedVals

References

  1. Hyndman, R.J., & Athanasopoulos, G., 2021. Forecasting: principles and practice. 3rd edition.
  2. Fable package documentation
  • https://fable.tidyverts.org/index.html
  • https://fable.tidyverts.org/articles/fable.html

Libraries

library(fpp3)

Introduction

In this session we are going to:

  • Explore some benchmark models. These are models that are simple in nature, but yet will serve as a measure of performance to which we will compare more complex models.
    • If more complex models do not outperform this benchmark models, their complexity will not be justified.
  • Introduce the concept of fitted values. and compare it to forecasts.
  • Practice the basic steps behind fitting models and forecasting with the library fable.
  • Do some exercises.

For the most of the simple examples in this notebook we will work with this time series studying the production of bricks between 1970 and 2004:

bricks <- aus_production %>%
  filter_index("1970 Q1" ~ "2004 Q4") %>%
  select(Bricks)

head(bricks)
# A tsibble: 6 x 2 [1Q]
  Bricks Quarter
   <dbl>   <qtr>
1    386 1970 Q1
2    428 1970 Q2
3    434 1970 Q3
4    417 1970 Q4
5    385 1971 Q1
6    433 1971 Q2

Fitted values vs forecasts

Before using a model, we will need to fit that model (classic math terminology). This process is also known as training the model (machine learning terminology). This means that we will use data to produce estimates of the parameters of the model. We will call this data training data.

Note a few important technical details:

  • I talk about estimates of the parameters because we are estimating these parameters from a particular sample or in our case, from a particular realisation of the time series.
    • Notation:
      • \(\beta\) - true value of the parameter.
      • \(\hat{\beta}\) - estimate of the parameter.
    • Many times we will abuse this notation and write \(\beta\) instead of \(\hat{\beta}\), but know that, when fitting a model, we are producing estimates of the parameters.
  • If we fitted the model to another sample or to another realization of the time series (think of this as repeating an experiment), we would produce different estimates of those parameters.
  • Being estimates, those parameters have associated standard errors.

Fitted values - example with linear regression

Now let us resort to your knowledge of linear regression to ilustrate the concept of fitted values. When fitting a linear regression model to predict a variable \(y\) based on the values of a predictor \(x\), we take a sample of data (our training data) and produce estimates of the model parameters.

Once these estimates \(\beta_0\) and \(\beta_1\) have been produced, we can compute the values of \(y\) predicted by the model for the specific values of \(x\) of the training data. The values of \(y\) returned by the model would be the fitted values \(\hat{y_t}\) of the linear regression model.

At each point there is a difference between the actual value of the training data \(y\) and the fitted value \(\hat{y_t}\). This is what we call the random error. The word error here should not be understood as mistake, but rather as anything that may affect \(y_t\) other than \(x_t\)

\[ y_t - \hat{y_t} = \varepsilon_t \]

The image below illustrates this:

Time Series case - Fitted values and forecasts

The following figure will serve to clarify the difference between the concepts of fitted values and forecasts. This figure summarizes the forecasting cast and is very relevant for both this and the second part of the course:

In this figure:

  • The black dots represent the available observations at the time of forecasting \(y_t\). These extend fron \(t=1\) to \(t=T\)
  • The blue dots represent the fitted values, values returned by the model we have fitted in the training region. These are defined as one step ahead forecasts on the training data. \(y_{t|t-1}\). More on this later. The observations minus the fitted values are the residuals of the model.
  • The green dots represent the forecasts beyond the available data, at a horizon h. \(y_{T+h|T}\). These are the forecasts produced by whichever model we decide to use.
  • The orange dots represent future observations not available at the time of forecasting. Once these are available, we could compute the **forecast errors*.

More on this concepts below.

Forecasts - \(\hat{y}_{T+h|T}\)

When fitting a time series model to a given realization of a time series - that is, to our training data - the goal is to produce estimates of the model parameters. Once the model is fitted, it will allow us to produce forecasts beyond the training data. Let us assume that:

  • The training data starts at \(t=1\) and ends at \(t=T\).
  • The forecasts are produced for a horizon \(h\).

The notation for the forecast at \(T+h\) given everything that has occurred up tu time \(T\) (end of the training data) is:

\[ \begin{align*} \text{Forecast at time T+h based on what has happened up to time T:} && \hat{y}_{T+h|T} \end{align*} \]

Fitted values - \(\hat{y}_{t|t-1}\)

Fitted values are one-step in-sample (i.e., in the time region of the training data) “forecasts” (we will see some exceptions to this definition):

  • one-step ahead means that the forecast only extend one time step into the future (\(h=1\)).
    • To produce the one-step ahead forecast at time \(t\), only information up to \(t-1\) is considered.
  • in-sample means that the forecasts belong to the time-region of the training data (training data = the time series to which we fit the model).

\[ \begin{align*} \text{Fitted value at time t based on what has happened up to time t-1:} && \hat{y}_{t|t-1} \end{align*} \]

Given their definition as one-step ahead forecasts, we denote the fitted value at time \(t\) as \(\hat{y}_{t|t-1}\), meaning the forecast is based on observations up to \(t-1\), i.e. {\(y_{1},\dots,y_{t-1}\)}. We will abuse notation and sometimes refer to the fitted values simply as \(\hat{y_t}\) instead of \(\hat{y}_{t|t-1}\).

In later sessions we will see that, for many models, the equation for the fitted values will be obtained from the equation for a forecast at a horizon \(h\) as follows:

\[ \begin{align*} \text{Forecast at a horizon h given information up to t=T}: && \hat{y}_{T+h|T} \\ \text{Particularize at h = 1}: && \hat{y}_{T+1|T} \\ \text{Particularize equation at t instead of T+1}: && \hat{y}_{t|t-1} \end{align*} \]

The last change in the equation (particularize at \(t\) instead of \(T+1\)) is simply substituting \(T\) by \(t-1\). That is, we are using the change of variable \(t=T+1 \rightarrow T=t-1\). Hence:

\[ \begin{equation} \hat{y}_{T+1|T} \underset{\substack{\uparrow \\ \text{C.V:} \\ T = t-1}}{=} \hat{y}_{t-1+1|t-1} = \hat{y}_{t|t-1} \end{equation} \]

The graph below shows an example of all these concepts:

  • The black line represents the bricks time series.
  • The dashed red line represents the fitted values resulting from fitting the Naive model to this time series. As we will see later, these are one-step ahead forecasts.
  • The blue line represents the forecasts given by the Naive model fitted to the data. These extend several time steps into the future.
    • Confidence regions of 80 and 95% are also depicted.

Forecasts compared to fitted values - summary

The table below summarizes the comparison between the concepts of fitted values vs. forecasts that we explained in the foregoing section:

Property Forecasts - \(\hat{y}_{T+h|T}\) Fitted values - \(\hat{y}_{t|t-1}\)
Definition Forecast at \(T+h\) given everything that occurred up to \(T\) Fitted value at \(t\) given everything that occurred up to \(t-1\)
Range Extend beyond the range of time of the training data (beyond \(T\)) Limited to the time range corresponding to the training data (\(t \leq T\))
Steps Can be multi-step, that is, for a horizon \(h>1\) Are one-step ahead (\(h=1\))

Importantly sometimes fitted values will not be strictly one-step ahead forecasts (for example in the Mean model).

Point Forecast and Forecast Distribution

To take a statistical approach to the forecasting task and be able to account for the uncertainty, it is best to cosider a time series as a collection of random variables indexed over time.

  • Past values of the time series {\({y_1, y_2, \dots, y_T}\)} are random variables that have been realized, that is, they have taken a specific outcome, a number, they have collapsed to a single value.
  • Future values of the time series, the forecasts {\(\hat{y}_{T+h|T}\)}, are random variables that have yet to be realized.

Let us look at the following example of the international arrivals to australia, in which a model has been fitted and some forecasts have been produced:

If we focus on the specific forecast region, each point \(T+h\) in the future has an associated forecast distribution:

More technically, at each point \(T+h\) in the future corresponding to the forecast region we have:

  • \(y_{T+h|T}\), the random variable at \(T+h\) given the information available up to time \(T\)
  • A forecast distribution associated to the random variable \(y_{T+h|T}\) that describes the probability of occurrence of specific values.
  • A point forecast \(\hat{y}_{T+h|T}\) that is the mean of the forecast distribution (the median is also used sometimes).

The most important concepts signaled here are:

  • Forecast distribution as the set of values \(y_{T+h|T}\) could take and the description of the associated probabilities.
  • Point forecast \(\hat{y}_{T+h|T}\) as the mean value of the forecast distribution.

As a side note, a colection of random variables {\(y_t\)} indexed by \(t\) is a type of stochastic process.

Residuals

The residuals are the difference between the observations and the corresponding fitted values. At a specific point \(t\), the value of the residuals is:

\[ e_{t} = y_{t}-\hat{y}_{t}. \]

  • Innovation residuals: these are the residuals in the transformed domain, only relevant if a transformation has been used. Otherwise innovation residuals and residuals will match.
    • For example, when using logarithms of the data \(w_t = log(y_t)\), the innovation residuals are given by \(w_t - \hat{w_t}\) and the regular residuals are given by \(y_t - \hat{y_t}\).

We will see later in this notebook that the residuals can be extracted from a fitted model in the fpp3 library using the function augment().

Mean model

Forecasts of all future values are equal to the average of the historical data.

\[ \hat{y}_{T+h|T} = \bar{y} = (y_1 + \dots + y_{T})/T. \]

Defining and training the model (generating the fitted values or estimates)

fit <- 
  bricks %>% 
  model(
    mean = MEAN(Bricks)
    )
fit
# A mable: 1 x 1
     mean
  <model>
1  <MEAN>

This particular mable (model table) has 1 row (1 time series fitted) and 1 column (1 model)

Fitted values

The function augment() allows us to extract some relevant information contained within the model objects generated by the fpp3 library. Among other things, fitted values and residuals. Note that the model also contains the original time series (column Bricks).

fitted_vals <- 
  fit %>% 
  augment()

head(fitted_vals)
# A tsibble: 6 x 6 [1Q]
# Key:       .model [1]
  .model Quarter Bricks .fitted .resid .innov
  <chr>    <qtr>  <dbl>   <dbl>  <dbl>  <dbl>
1 mean   1970 Q1    386    451.  -64.9  -64.9
2 mean   1970 Q2    428    451.  -22.9  -22.9
3 mean   1970 Q3    434    451.  -16.9  -16.9
4 mean   1970 Q4    417    451.  -33.9  -33.9
5 mean   1971 Q1    385    451.  -65.9  -65.9
6 mean   1971 Q2    433    451.  -17.9  -17.9

The mean model is a good example in which fitted values are not one-step ahead forecasts on the training data, because it actually uses data beyond the point at which the fitted value is computed to produce the fitted values. That is, it uses the entirety of the training data to compute the mean.

fitted_vals %>% 
  filter(.model == "mean") %>%
  autoplot(Bricks, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")

Forecasts

To produce forecasts in the fable library we use the forecast() function on the fitted model. We specify an argument: h, the number of time steps we wish to forecast.

# produce forecasts for 8 timesteps
forecasts <- fit %>% forecast(h = 8)
forecasts
# A fable: 8 x 4 [1Q]
# Key:     .model [1]
  .model Quarter       Bricks .mean
  <chr>    <qtr>       <dist> <dbl>
1 mean   2005 Q1 N(451, 4022)  451.
2 mean   2005 Q2 N(451, 4022)  451.
3 mean   2005 Q3 N(451, 4022)  451.
4 mean   2005 Q4 N(451, 4022)  451.
5 mean   2006 Q1 N(451, 4022)  451.
6 mean   2006 Q2 N(451, 4022)  451.
7 mean   2006 Q3 N(451, 4022)  451.
8 mean   2006 Q4 N(451, 4022)  451.

The output is a so called fable (forecast table). Some comments about the result:

  • The column .mean represents the forecast for each specific horizon \(h=1, 2, \dots\). That is, the point forecast \(\hat{y}_{T+h|T}\). The notation .mean is used because it refers to the mean of the forecast distribution.
  • The colun Bricks in the result now contains a distribution object detailing the forecast distribution at each forecast horizon. We will see how to handle these objects later in the subject.

Important: to depict the forecasts along with the time series use autoplot on the fable object (in our case the variable forecasts) and then specify the original tsibble within autoplot (in this case bricks). This will result in the forecasts being depicted along the original time series:

# TO depict the forecasts and the original series use autoplot() with the 
forecasts %>% 
  
  # Depicts the original time series and the forecasts
  autoplot(bricks, level = FALSE)
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.

If you now wish to add the fitted values, you may use geom_line() to add another layer to the graph:

# TO depict the forecasts and the original series use autoplot() with the 
forecasts %>% 
  
  # Depicts the original time series and the forecasts
  autoplot(bricks, level = FALSE) +
    
  # Overlays the fitted values
  geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.

Number of parameters

Using the function tidy() on a fitted model, we can get the number of parameters of the model

tidy(fit)
# A tibble: 1 × 6
  .model term  estimate std.error statistic   p.value
  <chr>  <chr>    <dbl>     <dbl>     <dbl>     <dbl>
1 mean   mean      451.      5.34      84.4 2.58e-121

In this case there is only one parameter, the mean of the sample.

Naïve model

The Naïve model sets all forecasts to be the value of the last observation. Let us use the aus_exports dataset

aus_exports <- filter(global_economy, Country == "Australia")
autoplot(aus_exports, Exports)

Defining and training the model (generating the fitted values or estimates)

fit <- aus_exports %>% model(Naive = NAIVE(Exports))
fit
# A mable: 1 x 2
# Key:     Country [1]
  Country     Naive
  <fct>     <model>
1 Australia <NAIVE>

Again the mable has 1 row (we are only fitting the model to 1 time series) and one column for the model (we only fitted one model).

Fitted values

# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals) 
# A tsibble: 6 x 7 [1Y]
# Key:       Country, .model [1]
  Country   .model  Year Exports .fitted .resid .innov
  <fct>     <chr>  <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
1 Australia Naive   1960    13.0    NA   NA     NA    
2 Australia Naive   1961    12.4    13.0 -0.591 -0.591
3 Australia Naive   1962    13.9    12.4  1.54   1.54 
4 Australia Naive   1963    13.0    13.9 -0.937 -0.937
5 Australia Naive   1964    14.9    13.0  1.93   1.93 
6 Australia Naive   1965    13.2    14.9 -1.72  -1.72 
# Print fitted values along with the original series
fitted_vals %>% 
  filter(.model == "Naive") %>%
  autoplot(Exports, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).

fit <- aus_exports %>% model(Naive = NAIVE(Exports))
fit
# A mable: 1 x 2
# Key:     Country [1]
  Country     Naive
  <fct>     <model>
1 Australia <NAIVE>

The above graph looks very similar to the first lag of the time series! In fact, that is exactly what it is.

The naïve model is computing the fitted values by calculating a one step ahead forecast from the previous observation. Keep this in mind because we will often refer to the fitted values as one-step ahead forecasts on the training data. In this case we are not yet splitting our time series prior to fitting the model, so the entire time series is our training data. We fitted the model to our entire time series when executing the command below

fit <- aus_exports %>% model(Naive = NAIVE(Exports))
fit

Forecasts

# produce forecasts for 8 timesteps
forecasts <- fit %>% forecast(h = 8)
forecasts
# A fable: 8 x 5 [1Y]
# Key:     Country, .model [1]
  Country   .model  Year    Exports .mean
  <fct>     <chr>  <dbl>     <dist> <dbl>
1 Australia Naive   2018 N(21, 1.5)  21.3
2 Australia Naive   2019 N(21, 3.1)  21.3
3 Australia Naive   2020 N(21, 4.6)  21.3
4 Australia Naive   2021 N(21, 6.1)  21.3
5 Australia Naive   2022 N(21, 7.6)  21.3
6 Australia Naive   2023 N(21, 9.2)  21.3
7 Australia Naive   2024  N(21, 11)  21.3
8 Australia Naive   2025  N(21, 12)  21.3
# Depict the forecasts
forecasts %>%
  autoplot(aus_exports, level = FALSE) +
  
  # Overlays the fitted values
  geom_line(data = fitted_vals, aes(y = .fitted), colour = "blue", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 1 row containing missing values (`geom_line()`).

Unlike the fitted values, the forecasts can extend multiple steps ahead into the future.

Number of parameters

tidy(fit)
# A tibble: 0 × 7
# ℹ 7 variables: Country <fct>, .model <chr>, term <chr>, estimate <dbl>,
#   std.error <dbl>, statistic <dbl>, p.value <dbl>

As we can see, the Naïve model does not have any parameter. It just forecasts by extending the last observation.

Seasonal naïve

Each forecast equal to the last observed value from the same season (e.g. the same month of the previous year). Formally the forecast at time T+h:

\[\begin{align*} \hat{y}_{T+h|T} = y_{T+h-m(k+1)} \end{align*}\]

where:

  • \(m\) is the seasonal period
  • \(k\) is the integer part of \((h-1)/m\) (i.e, the number of complete years in the forecast period prior to time T + h)
    • Example: for monthly data (\(m=12\))
      • \(1 \leq h \leq 12\) (forecasts of up to one season ahead)
        • \(0 \leq h-1/m < 1 \rightarrow k = int(\frac{h-1}{m}) = 0\).
        • Since \(k=0 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(0+1)}=y_{T+h-m}\)
        • i.e.: \(\hat{y}_{T+h|T}=y_{T+h-m}\): the forecast is equal to the matching observation of the last available season in the historical data.
      • \(13 \leq h \leq 24\) (forecasts of up to two sessions ahead)
        • \(1 \leq h-1/m < 2 \rightarrow k = int(\frac{h-1}{m}) = 1\).
        • Since \(k=0 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(1+1)}=y_{T+h-2m}\)
        • i.e.: \(\hat{y}_{T+h|T}=y_{T+h-2m}\): the forecast is equal to the matching observation of the last available season in the historical data.
      • \(25 \leq h \leq 36\) (forecasts of up to three sessions ahead)
        • \(2 \leq h-1/m < 3 \rightarrow k = int(\frac{h-1}{m}) = 2\).
        • Since \(k=0 \rightarrow \hat{y}_{T+h|T}=y_{T+h-m(2+1)}=y_{T+h-3m}\)
        • i.e.: \(\hat{y}_{T+h|T}=y_{T+h-3m}\): the forecast is equal to the matching observation of the last available season in the historical data.
      • \(37 \leq h \leq 48\) (forecasts of up to four sessions ahead)

The interpretation of the formula is therefore quite simple. In the case of monthly data:

  • all future February values will be equal to the last observed February value.
  • all future March values will be equal to the last observed February value.

This will be better understood with the examples below.

Let us do an example with the us_employment series:

employed <- filter(us_employment, Title == "Total Private", Month >= yearmonth("01-01-2010"))
head(employed)
# A tsibble: 6 x 4 [1M]
# Key:       Series_ID [1]
     Month Series_ID     Title         Employed
     <mth> <chr>         <chr>            <dbl>
1 2010 Jan CEU0500000001 Total Private   105444
2 2010 Feb CEU0500000001 Total Private   105490
3 2010 Mar CEU0500000001 Total Private   106176
4 2010 Apr CEU0500000001 Total Private   107220
5 2010 May CEU0500000001 Total Private   107931
6 2010 Jun CEU0500000001 Total Private   108710
autoplot(employed)
Plot variable not specified, automatically selected `.vars = Employed`

Defining and training the model

fit <- employed %>% model(SNaive = SNAIVE(Employed))
fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID       SNaive
  <chr>          <model>
1 CEU0500000001 <SNAIVE>

NOTE: we may use the special function lag to specify the seasonal period the method is to consider. Most of the time this is not necessary, but in case there are multiple possible seasonal periods, you may want to specify which specifically you want SNAIVE() to consider.

# In this case it is not necessary to specify lag
fit <- employed %>% model(SNaive = SNAIVE(Employed ~ lag("year")))
fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID       SNaive
  <chr>          <model>
1 CEU0500000001 <SNAIVE>

Fitted values

Because the method generates forecasts by using the corresponding observation from the previous season, the model does not generate fitted values for the first season (remember fitted values are one-step ahead forecasts):

# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals) 
# A tsibble: 6 x 7 [1M]
# Key:       Series_ID, .model [1]
  Series_ID     .model    Month Employed .fitted .resid .innov
  <chr>         <chr>     <mth>    <dbl>   <dbl>  <dbl>  <dbl>
1 CEU0500000001 SNaive 2010 Jan   105444      NA     NA     NA
2 CEU0500000001 SNaive 2010 Feb   105490      NA     NA     NA
3 CEU0500000001 SNaive 2010 Mar   106176      NA     NA     NA
4 CEU0500000001 SNaive 2010 Apr   107220      NA     NA     NA
5 CEU0500000001 SNaive 2010 May   107931      NA     NA     NA
6 CEU0500000001 SNaive 2010 Jun   108710      NA     NA     NA
# Print fitted values along with the original series
fitted_vals %>% 
  filter(.model == "SNaive") %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 12 rows containing missing values (`geom_line()`).

Forecasts

# produce forecasts for 36 timesteps
forecasts <- fit %>% forecast(h = 36)
forecasts
# A fable: 36 x 5 [1M]
# Key:     Series_ID, .model [1]
   Series_ID     .model    Month           Employed  .mean
   <chr>         <chr>     <mth>             <dist>  <dbl>
 1 CEU0500000001 SNaive 2019 Oct N(128001, 5538565) 128001
 2 CEU0500000001 SNaive 2019 Nov N(128415, 5538565) 128415
 3 CEU0500000001 SNaive 2019 Dec N(128363, 5538565) 128363
 4 CEU0500000001 SNaive 2020 Jan N(125932, 5538565) 125932
 5 CEU0500000001 SNaive 2020 Feb N(126370, 5538565) 126370
 6 CEU0500000001 SNaive 2020 Mar N(126994, 5538565) 126994
 7 CEU0500000001 SNaive 2020 Apr N(128007, 5538565) 128007
 8 CEU0500000001 SNaive 2020 May N(128771, 5538565) 128771
 9 CEU0500000001 SNaive 2020 Jun N(129800, 5538565) 129800
10 CEU0500000001 SNaive 2020 Jul N(129883, 5538565) 129883
# ℹ 26 more rows
# Depict the forecasts
forecasts %>%
  autoplot(employed, level = FALSE) +
  autolayer(fitted_vals, .fitted, colour = "blue", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 12 rows containing missing values (`geom_line()`).

Number of parameters

tidy(fit)
# A tibble: 0 × 7
# ℹ 7 variables: Series_ID <chr>, .model <chr>, term <chr>, estimate <dbl>,
#   std.error <dbl>, statistic <dbl>, p.value <dbl>

As we can see, the seasonal naïve model, just like the naïve model, does not have any parameters

Drift method

Variation of the naïve method. Allows forecasts to increase or decrease over time. Drift is the amount of change over time, set to be the average change in the historical data.

The drift method extrapolates this average change, estimated as a slope, into the future to compute \(\hat{y}_{T+h|T}\) simply multiplying this average change by \(h\). In other words, it creates a straight line covering the entire forecast horizon with a slope equal to the average change. Mathematically, it works as follows:

If \(\Delta_i y_t = y_i - y_{i-1}\) then the equation for the drift method may be written as:

\[ \begin{equation} \hat{y}_{T+h|T} = y_{T} + h \underbrace{\frac{\sum_{t=2}^T\Delta_i y_t}{T-1}}_{\substack{\text{average change of } y \\ \text{in the historical data}}} = y_{T} + \frac{h}{T-1}\sum_{t=2}^T (y_{t}-y_{t-1}) = y_{T} + h \left( \frac{y_{T} -y_{1}}{T-1}\right) \end{equation} \]

Defining and training the model

  • NOTE: RW stands for Random Walk. More about this later in the subject, in the context of ARIMA models.
fit <- employed %>% model(Drift = RW(Employed ~ drift()))
fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID             Drift
  <chr>               <model>
1 CEU0500000001 <RW w/ drift>

Equivalent to drawing a line between the first and last observations and extrapolating into the future (the final graph of the example clarifies this further).

Fitted values

# Extract fitted values and inspect table
fitted_vals <- fit %>% augment()
head(fitted_vals) 
# A tsibble: 6 x 7 [1M]
# Key:       Series_ID, .model [1]
  Series_ID     .model    Month Employed .fitted .resid .innov
  <chr>         <chr>     <mth>    <dbl>   <dbl>  <dbl>  <dbl>
1 CEU0500000001 Drift  2010 Jan   105444     NA     NA     NA 
2 CEU0500000001 Drift  2010 Feb   105490 105650.  -160.  -160.
3 CEU0500000001 Drift  2010 Mar   106176 105696.   480.   480.
4 CEU0500000001 Drift  2010 Apr   107220 106382.   838.   838.
5 CEU0500000001 Drift  2010 May   107931 107426.   505.   505.
6 CEU0500000001 Drift  2010 Jun   108710 108137.   573.   573.
# Print fitted values along with the original series
fitted_vals %>% 
  filter(.model == "Drift") %>%
  autoplot(Employed, colour = "gray") +
  geom_line(aes(y=.fitted), colour = "blue", linetype = "dashed")
Warning: Removed 1 row containing missing values (`geom_line()`).

The fitted values look very similar to the SNAIVE fitted values. But that is only because they are one step-ahead forecasts. Note that values for the first observations are produced (unlike for the seasoanl naïve) case.

Forecasts

forecasts <- fit %>% forecast(h=8)
forecasts
# A fable: 8 x 5 [1M]
# Key:     Series_ID, .model [1]
  Series_ID     .model    Month           Employed   .mean
  <chr>         <chr>     <mth>             <dist>   <dbl>
1 CEU0500000001 Drift  2019 Oct  N(129518, 764786) 129518.
2 CEU0500000001 Drift  2019 Nov N(129724, 1542645) 129724.
3 CEU0500000001 Drift  2019 Dec N(129929, 2333578) 129929.
4 CEU0500000001 Drift  2020 Jan N(130135, 3137584) 130135.
5 CEU0500000001 Drift  2020 Feb   N(130341, 4e+06) 130341.
6 CEU0500000001 Drift  2020 Mar N(130547, 4784816) 130547.
7 CEU0500000001 Drift  2020 Apr N(130752, 5628041) 130752.
8 CEU0500000001 Drift  2020 May N(130958, 6484340) 130958.
# Extract the initial and final points of the series
# to depict the average change slope.
n_rows = nrow(employed)
drift_points <- tibble(Month = c(employed$Month[1], employed$Month[n_rows]), 
                       Employed = c(employed$Employed[1], employed$Employed[n_rows])) %>%
                as_tsibble()
Using `Month` as index variable.
# Depict the forecasts
forecasts %>%
  autoplot(employed, level = FALSE) +
  autolayer(fitted_vals, .fitted, colour = "blue", linetype = "dashed") + 
  geom_line(drift_points, mapping = aes(x = Month, y = Employed), color = "red", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 1 row containing missing values (`geom_line()`).

In the above graph we can see that:

  1. The forecasts are generated following the slope of the line that connects the initial and final points of the series (red slope). A further example is depicted in the image below:

Further example of the drift method. From [1]
  1. The fitted values are very similar to those of the Naïve Method. Nonetheless, they are not the same. In this case each fitted value is generated by a one-step forecast using the drift method. That is, following the slope resulting from connecting each point of the series with the initial point and extending it one step into the future.

Number of parameters

We can look at the number of parameters in a model and their values using the function tidy(). If we apply it to the drift model we get:

tidy(fit)
# A tibble: 1 × 7
  Series_ID     .model term  estimate std.error statistic p.value
  <chr>         <chr>  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 CEU0500000001 Drift  b         206.      80.8      2.54  0.0123

The drift model has only one parameter, the slope of the line joining the first and the last observation.

Forecasting with decomposition

Recall that there where primarily two types of schemes in time series:

If the seasonal component is \(\hat{S_t}\), the seasonally adjusted component \(\hat{A_t}\) of the time series is what results from removing the effect of seasonality from the time series.

The seasonallly adjusted component is computed differently depending on whether the scheme is additive or multiplicative:

\[ \begin{align*} \text{For additive schemes} && \hat{A_t} = y_t - \hat{S_t} \\ \text{For multiplicative schemes} && \hat{A_t} = \frac{y_t}{\hat{S_t}} \\ \end{align*} \]

The decomposed series can be written as:

  • \(y_t = \hat{S}_t + \hat{A}_t\) for additive schemes
    • where \(\hat{A}_t = \hat{T}_t+\hat{R}_{t}\) is the seasonally adjusted component.
  • \(y_t = \hat{S}_t\hat{A}_t\) for multiplicative schemes
    • where \(\hat{A}_t = \hat{T}_t\hat{R}_{t}\) is the seasonally adjusted component.

To forecast a decomposed time series \(\hat{S_t}\) and \(\hat{A_t}\) are forecast separately:

  • \(\hat{A_t}\): any non-seasonal forecasting method may be used (drift method, Holt’s method, ARIMA)…
  • \(\hat{S_t}\): usually assumed to remain unchanged, or to change very slowly. Many times it is forecast by taking the observations of the last season, that is, using simply a seasonal naïve model.

Fitting the model

Let us again resort to the employed time series:

employed <- filter(us_employment, Title == "Total Private", Month >= yearmonth("01-01-2010"))
autoplot(employed)
Plot variable not specified, automatically selected `.vars = Employed`

To fully understand the model specification, let us first look at the outcome of an STL decomposition applied to this series

employed %>% 
  model(
    stl = STL(Employed)
  ) %>% 
  components()
# A dable: 117 x 8 [1M]
# Key:     Series_ID, .model [1]
# :        Employed = trend + season_year + remainder
   Series_ID .model    Month Employed  trend season_year remainder season_adjust
   <chr>     <chr>     <mth>    <dbl>  <dbl>       <dbl>     <dbl>         <dbl>
 1 CEU05000… stl    2010 Jan   105444 1.07e5      -2016.    272.         107460.
 2 CEU05000… stl    2010 Feb   105490 1.07e5      -1816.     -8.49       107306.
 3 CEU05000… stl    2010 Mar   106176 1.07e5      -1231.    -33.0        107407.
 4 CEU05000… stl    2010 Apr   107220 1.08e5       -361.     14.7        107581.
 5 CEU05000… stl    2010 May   107931 1.08e5        305.    -69.4        107626.
 6 CEU05000… stl    2010 Jun   108710 1.08e5       1006.   -121.         107704.
 7 CEU05000… stl    2010 Jul   108784 1.08e5        916.    -85.9        107868.
 8 CEU05000… stl    2010 Aug   108936 1.08e5        886.    -36.8        108050.
 9 CEU05000… stl    2010 Sep   108568 1.08e5        357.     -9.10       108211.
10 CEU05000… stl    2010 Oct   108984 1.08e5        648.    -17.2        108336.
# ℹ 107 more rows

The seasonal component is stored in a column named season_year and the seasonally adjusted component is stored in a column named season_adjust. To give a full specification to decomposition_model() we need to:

  • Specify a decomposition scheme
  • Provide a model for the seasonally adjusted component.
  • Provide a model for the seasonal component (by default the SNAIVE component is used, so this is not strictly necessary).
fit <- employed %>% 
  model(
    decomp = decomposition_model(
                # Specify the decomposition scheme to be used.
                STL(Employed),
                # Specify a model for the seasonally adjusted component (in this case, a drift).
                RW(season_adjust ~ drift()),
                # Specify a model for the seasonal component (unnecesary, since SNAIVE is the default).
                SNAIVE(season_year)
            )
  )

fit
# A mable: 1 x 2
# Key:     Series_ID [1]
  Series_ID                        decomp
  <chr>                           <model>
1 CEU0500000001 <STL decomposition model>

Fitted values

As usual, you can extract the fitted values using augment():

fit %>% augment()
# A tsibble: 117 x 7 [1M]
# Key:       Series_ID, .model [1]
   Series_ID     .model    Month Employed .fitted .resid .innov
   <chr>         <chr>     <mth>    <dbl>   <dbl>  <dbl>  <dbl>
 1 CEU0500000001 decomp 2010 Jan   105444      NA     NA     NA
 2 CEU0500000001 decomp 2010 Feb   105490      NA     NA     NA
 3 CEU0500000001 decomp 2010 Mar   106176      NA     NA     NA
 4 CEU0500000001 decomp 2010 Apr   107220      NA     NA     NA
 5 CEU0500000001 decomp 2010 May   107931      NA     NA     NA
 6 CEU0500000001 decomp 2010 Jun   108710      NA     NA     NA
 7 CEU0500000001 decomp 2010 Jul   108784      NA     NA     NA
 8 CEU0500000001 decomp 2010 Aug   108936      NA     NA     NA
 9 CEU0500000001 decomp 2010 Sep   108568      NA     NA     NA
10 CEU0500000001 decomp 2010 Oct   108984      NA     NA     NA
# ℹ 107 more rows

QUESTION: Why are the first rows of the fitted values NAs??

Because the model combines a drift model with a seasonal naive model, it needs
a full season to go by before it can actually start producing forecasts. 
Remember fitted values can be thought of as one-step ahead forecasts.

Forecasts

fit %>%
  forecast() %>%
  autoplot(employed, level = FALSE) +
  labs(y = "Number of people",
       title = "US retail employment")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.

The graph above shows that the model combines the SNAIVE + the DRIFT forecasts (this is equivalent to adding the slope of the drift model to the seasonal naive model).

Exercise 1

For this exercise we are going to use the aus_retail dataset. Remember you can always use ?aus_retail in the console to red the details abut the dataset.

The dataset contains a variety of series that that might be filtered using their specific series ID. We will be working the the following Series ID: A3349767W. The following code filters the series for you.

retail_series <- aus_retail %>%
  filter(`Series ID` == "A3349767W")

head(retail_series)
# A tsibble: 6 x 5 [1M]
# Key:       State, Industry [1]
  State              Industry                      `Series ID`    Month Turnover
  <chr>              <chr>                         <chr>          <mth>    <dbl>
1 Northern Territory Clothing, footwear and perso… A3349767W   1988 Apr      2.3
2 Northern Territory Clothing, footwear and perso… A3349767W   1988 May      2.9
3 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jun      2.6
4 Northern Territory Clothing, footwear and perso… A3349767W   1988 Jul      2.8
5 Northern Territory Clothing, footwear and perso… A3349767W   1988 Aug      2.9
6 Northern Territory Clothing, footwear and perso… A3349767W   1988 Sep      3  

1. Use an STL decompostition to calculate the trend-cycle and seasonal indices. If you deem it necessary, adjust the trend and season windows (solved)

NOTE: remember that the STL decomposition is only applicable to additive schemes, which means you need to transform the data first to obtain an additive scheme. To pick the best transformation, I am going to use the guerrero feature, something we will see in the following session. For now take this analysis as a given (I leave the analysis so that you understand it later):

retail_series %>% autoplot()
Plot variable not specified, automatically selected `.vars = Turnover`

# let us examine the guerrero feature to see which box-cox transformation would
# be more appropriate.
lambda <- retail_series %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

lambda
[1] 0.08303631
# Given the guerrero feautre so close to 0, we pick up a log transformation
# Let us check the outcome of the transformation
retail_series <- retail_series %>% mutate(log_T = log(Turnover))
retail_series %>% autoplot(log(Turnover))

This menas that a log transformation should be applied. In this particular case we are not going to adjust the STL windows. We are going to specify the log transformation within the model formula. We will see later that this is important.

dcmp <- retail_series %>%
        model(stl = STL(log(Turnover))) %>%
        components()

dcmp %>% autoplot()

# ALTERNATIVELY, WE COULD HAVE USED THE TRANSFORMED VARIABLE LOG_T
# WE WILL SEE THAT THIS APPROACH IS LESS DESIRABLE, IT IS BEST TO SPECIFY THE
# TRANSFORMATION ON THE MODEL DEFINITION ITSELF, SO THAT THE FABLE LIBRARY
# IS AWARE THATTHERE IS ACTUALLY A TRANSFORMATION INVOLVED.
dcmp_2 <- retail_series %>%
        model(stl = STL(log_T)) %>%
        components()

dcmp_2 %>% autoplot()

Now, having this decomposition, proceed to carry out the rest of the exercise.

2. Extract the seasonally adjusted data and store it in a separate tsibble called season_adjust. Plot the seasonally adjusted data

season_adjust <- 
  dcmp %>% 
  select(season_adjust)
  season_adjust %>% autoplot()
Plot variable not specified, automatically selected `.vars = season_adjust`

3. Use the drift method to produce forcasts of the seasonally adjusted data. Generate a graph with the tieme series, the fitted values and the forecasts

# Fit the model
fit_drift <- 
  season_adjust %>% 
  model(
    Drift = RW(season_adjust ~ drift())
  )

fitted_vals_drift <- 
  fit_drift %>% 
  augment()

forecasts <- 
  fit_drift %>% forecast(h = 36)
  head(forecasts)
# A fable: 6 x 4 [1M]
# Key:     .model [1]
  .model    Month  season_adjust .mean
  <chr>     <mth>         <dist> <dbl>
1 Drift  2019 Jan N(2.6, 0.0037)  2.59
2 Drift  2019 Feb N(2.6, 0.0075)  2.60
3 Drift  2019 Mar  N(2.6, 0.011)  2.60
4 Drift  2019 Apr  N(2.6, 0.015)  2.60
5 Drift  2019 May  N(2.6, 0.019)  2.61
6 Drift  2019 Jun  N(2.6, 0.023)  2.61
forecasts %>% 
  autoplot(season_adjust, level = FALSE) +
  autolayer(fitted_vals_drift, .fitted, colour = "red", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 1 row containing missing values (`geom_line()`).

4. Extract the seasonal component and store it in a separate tsibble called seasonal. Use the SNAIVE method to produce forecasts of the seasonal component. Generate a graph with the time series, the fitted values and the forecasts

seasonal <-
  dcmp %>% 
  select(season_year)

seasonal
# A tsibble: 369 x 2 [1M]
   season_year    Month
         <dbl>    <mth>
 1    -0.131   1988 Apr
 2     0.00568 1988 May
 3     0.0443  1988 Jun
 4     0.177   1988 Jul
 5     0.103   1988 Aug
 6     0.0602  1988 Sep
 7     0.0525  1988 Oct
 8     0.00579 1988 Nov
 9     0.322   1988 Dec
10    -0.173   1989 Jan
# ℹ 359 more rows
fit_snaive <- 
  seasonal %>% 
  model(
    SNaive = SNAIVE(season_year)
    )

fit_snaive
# A mable: 1 x 1
    SNaive
   <model>
1 <SNAIVE>
fitted_vals_snaive <- 
  fit_snaive %>% 
  augment()

forecasts_snaive <- 
  fit_snaive %>% 
  forecast(h = 36)

forecasts_snaive %>%
  autoplot(seasonal, level = FALSE) +
  autolayer(fitted_vals_snaive, .fitted, colour = "red", linetype = "dashed")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 12 rows containing missing values (`geom_line()`).

5. Now use decomposition_model() on the original time seres. Use the drift method for the seasonally adjusted data and the seasonal naïve method for the seaxonal component.

OPTION 1: specify transformation within the model specification.
The forecasts will be automatically converted to the original variable.
This is the preferred option
fit_dcmp <- retail_series %>% 
  model(
    decomp = decomposition_model(
                # ARGUMENT 1: Specify the decomposition scheme to be used.
                # Note that you could even adjust the windows of trend and season
                # In this case, I left their default values but made the the
                # syntax explicit to be aware that you may adjust them.
                STL(log(Turnover) ~ trend(window = 21) + season(window = 13)),
                
                # ARGUMENT 2: Specify a model for the seasonally adjusted 
                # component (in this case, a drift).
                RW(season_adjust ~ drift()),
                
                # ARGUMENT 3 (optional, defaults to SNAIVE).
                # Specify a model for the seasonal component (redundant here, 
                # since SNAIVE is the default).
                SNAIVE(season_year)
            )
  )

fit_dcmp
# A mable: 1 x 3
# Key:     State, Industry [1]
  State              Industry                                             decomp
  <chr>              <chr>                                               <model>
1 Northern Territory Clothing, footwear and personal … <STL decomposition model>
fitted_vals <- fit_dcmp %>% augment()
fitted_vals
# A tsibble: 369 x 8 [1M]
# Key:       State, Industry, .model [1]
   State              Industry    .model    Month Turnover .fitted .resid .innov
   <chr>              <chr>       <chr>     <mth>    <dbl>   <dbl>  <dbl>  <dbl>
 1 Northern Territory Clothing, … decomp 1988 Apr      2.3      NA     NA     NA
 2 Northern Territory Clothing, … decomp 1988 May      2.9      NA     NA     NA
 3 Northern Territory Clothing, … decomp 1988 Jun      2.6      NA     NA     NA
 4 Northern Territory Clothing, … decomp 1988 Jul      2.8      NA     NA     NA
 5 Northern Territory Clothing, … decomp 1988 Aug      2.9      NA     NA     NA
 6 Northern Territory Clothing, … decomp 1988 Sep      3        NA     NA     NA
 7 Northern Territory Clothing, … decomp 1988 Oct      3.1      NA     NA     NA
 8 Northern Territory Clothing, … decomp 1988 Nov      3        NA     NA     NA
 9 Northern Territory Clothing, … decomp 1988 Dec      4.2      NA     NA     NA
10 Northern Territory Clothing, … decomp 1989 Jan      2.7      NA     NA     NA
# ℹ 359 more rows
fit_dcmp %>%
  forecast(h = 36) %>%
  autoplot(retail_series, level = FALSE)
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.

OPTION 2: specify transformation on a separate column (create new variable)
This works on the transformed domain and does not automatically back-transform
the forecasts.

OPTION 1 is preferred because it automatically converts the forecasts back
log_T_series <- select(retail_series, log_T)

fit_dcmp_2 <- log_T_series %>%
  model(stlf = decomposition_model(
        STL(log_T), # Details how to decompose the series
        RW(season_adjust ~ drift()) # Details model for the seasonally adjusted component
  ))

fitted_vals_2 <- fit_dcmp_2 %>% augment()

fit_dcmp_2 %>%
  forecast(h = 36) %>%
  autoplot(log_T_series, level = FALSE) +
  autolayer(fitted_vals_2, .fitted, colour = "red", linetype = "dashed") +
  labs(y = "log_T",
       title = "")
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Warning: Removed 12 rows containing missing values (`geom_line()`).

# Let us show the original time series to see the difference between the original
# and the transformed domain
fit_dcmp_2 %>%
  forecast(h = 36) %>%
  autoplot(log_T_series, level = FALSE) +
  autolayer(fitted_vals_2, .fitted, colour = "red", linetype = "dashed") +
  labs(y = "log_T",
       title = "") +
  
  autolayer(retail_series)
Warning: Plot argument `level` should be a numeric vector of levels to display.
Setting `level = NULL` will remove the intervals from the plot.
Plot variable not specified, automatically selected `.vars = Turnover`
Warning: Removed 12 rows containing missing values (`geom_line()`).